5.1 EMP_collapse

Data collapse refers to merging the original assay to generate a new assay. EMP_collapse can work based on the specified data in rowdata or coldata. Users can calculate the average, median, sum, maximum, or minimum as the combined result during the folding process.

5.1.1 Merge assays based on duplicate coldata

EMP_collapse can collapse the assay according to the annotation in the coldata.

🏷️Example: Merge the assay data according to the column Group in the coldata.

Original assay：

MAE |>
  EMP_assay_extract(experiment='taxonomy')

Folded assay:

Extract the assay from taxonomy experiment. The parameter estimate_group specifies that the data should be aggregated based on the Group column in coldata. The parameter collapse_by indicates that aggregation should be performed according to the col column in coldata. The parameter collapse_sep specifies that the + delimiter should be used to concatenate the string cells in the coldata where aggregation occurs.

MAE |>
  EMP_assay_extract(experiment='taxonomy') |>
  EMP_collapse(collapse_by='col',estimate_group = 'Group',
               method = 'mean',collapse_sep = '+')

Folded coldata:

Note：
Although the data aggregation operation essentially involves collapsing the assay, the assay, coldata, and rowdata are intrinsically linked from the moment the MAE object is created. Therefore, during the collapse process, the coldata is also aggregated accordingly. For continuous variables in coldata, the aggregation method used will be the same as that for the assay. For categorical variables in coldata, the specified collapse_sep delimiter will be used to concatenate the data. For boolean/logical variables, the parameter collapse_boolean will be used to concatenate the data.

MAE |>
  EMP_assay_extract(experiment='taxonomy') |>
  EMP_collapse(collapse_by='col',estimate_group = 'Group',
               method = 'mean',collapse_sep = '+') |>
  EMP_coldata_extract()

5.1.2 Merge assays based on duplicate rowdata

EMP_collapse can collapse the assay according to the annotation in the rowdata.

🏷️Example 1: Merge the assay data according to the column Class in the rowdata.

In microbiome studies, the EMP_collapse function can rapidly aggregate the features of the assay from higher taxonomic levels (such as Phylum or Genus) to lower levels by specifying the estimate_group parameter.

Note：
①After collapsing the assay based on rowdata, the feature of the assay will change, but the sample remains still. In this example, features originally at the species level are aggregated to the Class level.
②During the microbial data in this process, conflicts may arise between complete-level annotation and single-level annotation. Please refer to the Microbial Data Full Annotation Issue for more details.
③In this process, this module will automatically detect the aforementioned conflicts and prompt the user to choose between complete-level annotation and single-level annotation. Since the module defaults to automatic caching during the process, if you need to make a new selection, you can set use_cached = FALSE and then re-run the process.

MAE |>
  EMP_assay_extract(experiment='taxonomy') |>
  EMP_collapse(collapse_by='row',estimate_group = 'Class',
               method = 'sum',collapse_sep = '+')

🏷️Example 2: Merge the assay metabolite data according to rowdata.

注意：
①In metabolomics projects, metabolites often have multiple annotation levels. Users can quickly aggregate the assay data according to the annotation level of metabolites in rowdata.
②The features used for aggregation must be present in rowdata (for example, the assay in the geno_ko project cannot be collapsed according to the Class column in rowdata). Users can use the function EMP_rowdata_extract to view the feature annotation available for aggregation.

Original assay：

MAE |>
  EMP_assay_extract(experiment = 'untarget_metabol')

Folded assay: Merge the assay data according to the column MS2kegg in the rowdata.

Note：
In some metabolomics raw data, symbols such as NA, null, and - are commonly used to represent missing values. The parameter na_string allows users to specify these strings as missing values. During data aggregation, any rows containing missing values in the specified features for merging will be ignored.

MAE |>
  EMP_assay_extract(experiment = 'untarget_metabol') |>
  EMP_collapse(estimate_group = 'MS2kegg',collapse_by='row',
               na_string = c("NA", "null", "","-"),
               method = 'mean',collapse_sep = '+')

5.1 EMP_collapse

5.1 EMP_collapse

5.1.1 Merge assays based on duplicate coldata

5.1.2 Merge assays based on duplicate rowdata

results matching ""

No results matching ""